[SPARK-14353] Dataset Time Window `window` API for Python, and SQL
#12136
Test build #54801 has finished for PR 12136 at commit
Can we leave all the streaming stuff out of R for now? I don't know whether it makes sense to provide this functionality in R, and even if we do, the current API might change a lot, so having it in fewer languages would make it easier to change.
Ah OK, this "window" thing is not streaming specific. I'd still consider breaking them into two separate PRs so we can get the Python thing in sooner.
Test build #54807 has finished for PR 12136 at commit
Changed the title from "window API for Python, SQL, and R" to "window API for Python, and SQL"
Test build #54820 has finished for PR 12136 at commit
    } else {
      newArgs ++ otherCopyArgs
    }
    val defaultCtor = ctors.find { ctor =>
We need a smarter way to pick the constructor; otherwise the two constructors for TimeWindow cause issues depending on the Java version.
Test build #54906 has finished for PR 12136 at commit
@davies Addressed your comments
Test build #54912 has finished for PR 12136 at commit
Test build #54913 has finished for PR 12136 at commit
Test build #54933 has finished for PR 12136 at commit
Test build #54952 has finished for PR 12136 at commit
Test build #54961 has finished for PR 12136 at commit
Test build #54999 has finished for PR 12136 at commit
Test build #55008 has finished for PR 12136 at commit
LGTM, merging this into master, thanks!
…n Exception of Function Lookup
### What changes were proposed in this pull request?
When the exception is an invocation exception during function lookup, we return a useless/confusing error message:
For example,
```Scala
df.selectExpr("concat_ws()")
```
Below is the error message we got:
```
null; line 1 pos 0
org.apache.spark.sql.AnalysisException: null; line 1 pos 0
```
To get a meaningful error message, we need to get the cause. The fix is exactly the same as what we did in #12136. After the fix, the message we get is the exception raised in the constructor of the function implementation:
```
requirement failed: concat_ws requires at least one argument.; line 1 pos 0
org.apache.spark.sql.AnalysisException: requirement failed: concat_ws requires at least one argument.; line 1 pos 0
```
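The idea of reporting the cause rather than the wrapper can be sketched in Python. This is only an analogue of the behaviour described above, not the Scala fix itself: `InvocationError`, `AnalysisError`, `invoke`, `lookup_naive`, and `lookup_fixed` are hypothetical names, and the `concat_ws` here is a toy stand-in for the real function.

```python
class AnalysisError(Exception):
    """Hypothetical stand-in for the analysis-time error shown to the user."""


class InvocationError(Exception):
    """Hypothetical wrapper raised by a reflective call; it has no message of its own."""


def concat_ws(*args):
    # Toy stand-in: enforces the same requirement as the message quoted above.
    if not args:
        raise ValueError("requirement failed: concat_ws requires at least one argument.")
    sep, *rest = args
    return sep.join(rest)


def invoke(builder, args):
    # Simulates a reflective call that wraps whatever the builder raises.
    try:
        return builder(*args)
    except Exception as exc:
        raise InvocationError() from exc


def lookup_naive(builder, args):
    try:
        return invoke(builder, args)
    except InvocationError as exc:
        # Uses the wrapper's (empty) message, so the real reason is lost.
        raise AnalysisError(str(exc))


def lookup_fixed(builder, args):
    try:
        return invoke(builder, args)
    except InvocationError as exc:
        # Unwrap the cause so the user sees the original requirement failure.
        cause = exc.__cause__ or exc
        raise AnalysisError(str(cause)) from cause
```

With an empty argument list, `lookup_naive(concat_ws, [])` raises an `AnalysisError` with an empty message, while `lookup_fixed(concat_ws, [])` carries the requirement-failure text from the underlying exception.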
### How was this patch tested?
Added test cases.
Author: gatorsmile <gatorsmile@gmail.com>
Closes #15878 from gatorsmile/functionNotFound.
(cherry picked from commit 86430cc)
Signed-off-by: Reynold Xin <rxin@databricks.com>
…g an Invocation Exception of Function Lookup
### What changes were proposed in this pull request?
This PR backports #15878.
When the exception is an invocation exception during function lookup, we return a useless/confusing error message:
For example,
```Scala
df.selectExpr("concat_ws()")
```
Below is the error message we got:
```
null; line 1 pos 0
org.apache.spark.sql.AnalysisException: null; line 1 pos 0
```
To get a meaningful error message, we need to get the cause. The fix is exactly the same as what we did in #12136. After the fix, the message we get is the exception raised in the constructor of the function implementation:
```
requirement failed: concat_ws requires at least one argument.; line 1 pos 0
org.apache.spark.sql.AnalysisException: requirement failed: concat_ws requires at least one argument.; line 1 pos 0
```
### How was this patch tested?
Added test cases.
Author: gatorsmile <gatorsmile@gmail.com>
Closes #15902 from gatorsmile/functionNotFound20.
### What changes were proposed in this pull request?
The `window` function was added to Dataset in a previous PR. This PR adds the Python and SQL APIs for this function.
With this PR, SQL, Java, and Scala share the same APIs, in that users can call:
- `window(timeColumn, windowDuration)`
- `window(timeColumn, windowDuration, slideDuration)`
- `window(timeColumn, windowDuration, slideDuration, startTime)`

In Python, users can access all of the APIs above, and in addition they can call
- `window(timeColumn, windowDuration, startTime=...)`

that is, they can provide the `startTime` without providing the `slideDuration`. In this case, we generate tumbling windows.
### How was this patch tested?
Unit tests + manual tests
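To illustrate the difference between tumbling and sliding windows mentioned in the description, here is a minimal pure-Python sketch of window assignment. It is not Spark's implementation: `assign_windows` is a hypothetical helper, durations are plain integer seconds rather than duration strings, and the rule that the slide may not exceed the window length is an assumption of this sketch. It only shows how omitting the slide makes it default to the window length, so each timestamp falls into exactly one window.

```python
def assign_windows(ts, window, slide=None, start=0):
    """Return the [begin, end) windows, in seconds, that contain timestamp `ts`.

    Hypothetical helper for illustration only; not part of any Spark API.
    """
    if slide is None:
        slide = window  # no slide given: windows do not overlap (tumbling)
    if window <= 0 or slide <= 0:
        raise ValueError("window and slide must be positive")
    if slide > window:
        raise ValueError("slide must not exceed window in this sketch")

    # Latest window start at or before ts on the grid start + k * slide.
    last_start = ts - ((ts - start) % slide)

    windows = []
    begin = last_start
    # Walk backwards while the window beginning at `begin` still covers ts.
    while begin > ts - window:
        windows.append((begin, begin + window))
        begin -= slide
    return sorted(windows)


# Tumbling: one window per timestamp.
print(assign_windows(7, 10))            # [(0, 10)]
# Sliding: overlapping windows every 5 seconds.
print(assign_windows(7, 10, slide=5))   # [(0, 10), (5, 15)]
# Start offset without a slide, as in the Python-only call form above.
print(assign_windows(7, 10, start=3))   # [(3, 13)]
```

Passing only `start` mirrors the `window(timeColumn, windowDuration, startTime=...)` form: the window grid is shifted, but the windows still tumble.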